A Statistical Framework for the Prediction of Fault-Proneness
Abstract
Accurate prediction of fault-prone modules in the software development process enables effective discovery and identification of defects. Such prediction models are especially valuable for large-scale systems, where verification experts need to focus their attention and resources on problem areas in the system under development. This paper presents a methodology for predicting fault-prone modules using a modified random forests algorithm. Random forests improve classification accuracy by growing an ensemble of classification trees and letting them vote on the classification decision. We applied the methodology to five NASA public-domain defect data sets. These data sets vary in size, but all contain only a small proportion of defect samples in the learning set; in project PC1, for instance, only about 7% of the instances are defects. If maximizing overall accuracy is the goal, learning from such data usually yields a biased classifier, i.e., one that assigns the majority of samples to the non-defect class. To obtain better prediction of fault-proneness, two strategies are investigated: a proper sampling technique for constructing the tree classifiers, and threshold adjustment for determining the winning class. Both are found to be effective for accurately predicting fault-prone modules. In addition, the paper presents a thorough and statistically sound comparison of these methods against ten other classifiers frequently used in the literature.
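The two strategies named in the abstract can be illustrated with a short sketch. It is not the authors' modified random forests algorithm, whose details the abstract does not give. It assumes (1) class-balanced bootstrap sampling, i.e., drawing as many non-defect modules as defect modules for each tree, as one possible "proper sampling technique", and (2) a vote threshold on the share of trees voting "defect" as the threshold adjustment. To keep it short, each tree is a one-split decision stump rather than a fully grown classification tree. All function names (`balanced_bootstrap`, `fit_stump`, `fit_forest`, `predict`, etc.) and the `threshold` parameter are hypothetical.

```python
import random

def gini(labels):
    """Gini impurity of a list of 0/1 labels (1 = defect)."""
    if not labels:
        return 0.0
    p = sum(labels) / len(labels)
    return 2.0 * p * (1.0 - p)

def majority(labels):
    """Majority label; ties go to the defect class (1)."""
    return int(2 * sum(labels) >= len(labels))

def balanced_bootstrap(X, y, rng):
    """Sample with replacement as many non-defect rows as there are defect rows."""
    pos = [i for i, yi in enumerate(y) if yi == 1]
    neg = [i for i, yi in enumerate(y) if yi == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    k = len(pos)
    idx = [rng.choice(pos) for _ in range(k)] + [rng.choice(neg) for _ in range(k)]
    return [X[i] for i in idx], [y[i] for i in idx]

def fit_stump(X, y, n_features, rng):
    """One-split tree (a simplification); the split is chosen among a random feature subset."""
    d = len(X[0])
    best = None  # (weighted impurity, feature, threshold, left label, right label)
    for f in rng.sample(range(d), min(n_features, d)):
        for t in sorted({row[f] for row in X}):
            left = [yi for row, yi in zip(X, y) if row[f] <= t]
            right = [yi for row, yi in zip(X, y) if row[f] > t]
            if not left or not right:
                continue
            imp = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if best is None or imp < best[0]:
                best = (imp, f, t, majority(left), majority(right))
    if best is None:  # no usable split: predict a constant label
        label = majority(y)
        return (0, float("inf"), label, label)
    return best[1:]

def predict_stump(stump, x):
    f, t, left_label, right_label = stump
    return left_label if x[f] <= t else right_label

def fit_forest(X, y, n_trees=50, n_features=1, seed=0):
    """Grow each base learner on its own class-balanced bootstrap sample."""
    rng = random.Random(seed)
    return [fit_stump(*balanced_bootstrap(X, y, rng), n_features, rng)
            for _ in range(n_trees)]

def defect_vote_fraction(forest, x):
    """Share of base learners that vote 'defect' for module x."""
    return sum(predict_stump(s, x) for s in forest) / len(forest)

def predict(forest, x, threshold=0.5):
    """Flag a module as fault-prone when the defect share of votes reaches threshold."""
    return int(defect_vote_fraction(forest, x) >= threshold)
```

With `threshold=0.5` this reduces to plain majority voting; lowering it favors the defect class, trading some false alarms for more detected defects. The motivation is the imbalance noted above: on toy data where 7 of 100 modules are defective, a classifier that always answers "non-defect" already reaches 93% accuracy while detecting no defects at all.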
Similar resources
Evaluation of Classifiers in Software Fault-Proneness Prediction
The reliability of software depends on its fault-prone modules: the fewer fault-prone units a piece of software contains, the more we can trust it. Therefore, if we are able to predict the number of fault-prone modules in software, it will be possible to judge its reliability. In predicting software fault-prone modules, one of the contributing features is the software metric, by which one ...
Using Source Code Metrics and Ensemble Methods for Fault Proneness Prediction
Software fault prediction models are employed to optimize testing resource allocation by identifying fault-prone classes before testing phases. Several researchers have validated the use of different classification techniques to develop predictive models for fault prediction. The performance of the statistical models has been shown to be influenced by the training and testing datasets. Ensemble meth...
A particle filter and SVM integration framework for fault-proneness prediction in robot dead reckoning system
This paper proposes an integrated framework for fault prediction in the robot dead reckoning system. The integrated framework is built from a particle filter and a support vector machine (SVM). On this basis, weighted fault probability parameters can be extracted to train the prediction model. Different from the traditional particle filter fault prediction model, the proposed framework can overcom...
Empirical Studies to Predict Fault Proneness: A Review
Empirical validations of software metrics have been used to predict software quality in past years. This paper provides a review of empirical studies to predict software fault proneness with a specific focus on the techniques used. The paper highlights the milestone studies done from 1995 to 2010 in this area. Results show that the use of machine learning languages has started. This paper reviews works d...
Prediction of Change-Prone Classes Using Machine Learning and Statistical Techniques
In software development, the availability of resources is limited, which necessitates their efficient and effective utilization. This can be achieved through prediction of key attributes that affect software quality, such as fault proneness, change proneness, effort, maintainability, etc. The primary aim of this chapter is to investigate the relationship between object-oriented metrics ...
Publication date: 2005